Clustering Co-occurrence of Maximal Frequent Patterns in Streams
نویسندگان
چکیده
One way of getting a better view of data is by using frequent patterns. In this paper frequent patterns are (sub)sets that occur a minimal number of times in a stream of itemsets. However, the discovery of frequent patterns in streams has always been problematic. Because streams are potentially endless it is in principle impossible to say if a pattern is often occurring or not. Furthermore, the number of patterns can be huge and a good overview of the structure of the stream is lost quickly. The proposed approach will use clustering to facilitate the “online” analysis of the structure of the stream. A clustering on the co-occurrence of patterns will give the user an improved view on the structure of the stream. Some patterns might occur so often together that they should form a combined pattern. In this way the patterns in the clustering will approximate the largest frequent patterns: maximal frequent patterns. The number of (approximated) maximal frequent patterns is much smaller and combined with clustering methods these patterns provide a good view on the structure of the stream. Our approach to decide if patterns occur often together is based on a method of clustering where only the distance between pairs of patterns is known. This distance is the Euclidean distance between points in a 2-dimensional space, where the points represent the frequent patterns, or rather the most important ones. The coordinates are adapted when the records from the stream pass by, and reflect the real support of the corresponding pattern. In this setup the support is viewed as the number of occurrences in a time window. The main algorithm tries to maintain a dynamic model of the data stream by merging and splitting these patterns. Experiments show the versatility of the method.
منابع مشابه
Displaying Co-occurrences of Patterns in Streams for Website Usage Analysis
One way of getting a better view of data is by using frequent patterns. In this paper frequent patterns are (sub)sets that occur a minimal number of times in a stream of itemsets. However, the discovery of frequent patterns in streams has always been problematic. Because streams are potentially endless it is harder to say if a pattern is frequent or not. Furthermore, the number of patterns can ...
متن کاملClustering Co-occurrences of Maximal Frequent Patterns in Streams
One way of getting a better view of data is by using frequent patterns. In this paper frequent patterns are (sub)sets that occur a minimal number of times in a stream of itemsets. However, the discovery of frequent patterns in streams has always been problematic. Because streams are potentially endless it is in principle impossible to say if a pattern is often occurring or not. Furthermore, the...
متن کاملA Study on Distributed Frequent Co-occurrence Patterns Algorithms across Multiple Data Streams
With the era of big data coming, the data streams are fast, continuous, and unbounded. The real-time requirements of the data streams processing results are very high. A large number of researches have been on Frequent Co-occurrence Patterns across multiple data streams. But those algorithms are centralized, which is worked on a single compute node. The memory of a single compute node and CPU c...
متن کاملMining Frequent Co-occurrence Patterns across Multiple Data Streams
This paper studies the problem of mining frequent co-occurrence patterns across multiple data streams, which has not been addressed by existing works. Co-occurrence pattern in this context refers to the case that the same group of objects appear consecutively in multiple streams over a short time span, signaling tight correlations between these objects. The need for mining such patterns in real...
متن کاملMining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows
Todays, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection, work either on uncertai...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/0705.0588 شماره
صفحات -
تاریخ انتشار 2007